# AI Model
English Picks

F Lite
F Lite is a 10-billion-parameter diffusion model developed by Freepik and Fal, trained specifically on copyright-safe and Suitable For Work (SFW) content. The model was trained on Freepik's internal dataset of about 80 million legally compliant images, marking the first time a publicly available model at this scale has focused on legal and safe content. A technical report provides detailed information about the model, which is distributed under the CreativeML Open RAIL-M license. The model is designed to promote openness and accessibility in AI.
Image Generation
48.0K

Photogen By AI
Photogen by AI is a platform that quickly generates high-quality photos via AI. Users can upload their selfie photos and use AI models to transform them into professional portraits. Prices are divided into three tiers: Hobby, Pro, and Enterprise.
Image Generation
45.0K

GAIA 2
GAIA-2 is an advanced video generation model developed by Wayve, designed to provide diverse and complex driving scenarios for autonomous driving systems to improve safety and reliability. The model addresses the limitations of relying on real-world data collection by generating synthetic data, capable of creating various driving situations, including both regular and edge cases. GAIA-2 supports the simulation of various geographical and environmental conditions, helping developers quickly test and verify autonomous driving algorithms without high costs.
Video Production
42.8K
Fresh Picks

Cogview4
CogView4 is an advanced text-to-image generation model developed at Tsinghua University. Built on diffusion model technology, it generates high-quality, high-resolution images from text descriptions and supports both Chinese and English input. Its main advantages are strong multilingual support and high-quality image generation, making it well suited to users who need to generate images efficiently. The model was presented at ECCV 2024 and has significant research and application value. A minimal usage sketch follows this entry.
Image Generation
57.7K
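The entry above describes a text-to-image diffusion model; a minimal generation sketch is shown below, assuming the model is published on Hugging Face with a diffusers-compatible pipeline. The repository id and generation parameters are illustrative assumptions, not confirmed by this listing.

```python
# Minimal text-to-image sketch with a diffusers-compatible pipeline.
# The repository id and generation parameters are illustrative assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "THUDM/CogView4-6B",          # assumed Hugging Face repository id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="A watercolor painting of a lighthouse at dawn",
    width=1024,
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("cogview4_sample.png")
```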

Hunyuan Video Keyframe Control Lora
HunyuanVideo Keyframe Control LoRA is an adapter for the HunyuanVideo T2V model, focusing on keyframe video generation. It modifies the input embedding layer to effectively integrate keyframe information and applies Low-Rank Adaptation (LoRA) technology to optimize linear and convolutional input layers, enabling efficient fine-tuning. This model allows users to precisely control the starting and ending frames of the generated video by defining keyframes, ensuring seamless integration with the specified keyframes and enhancing video coherence and narrative. It has significant application value in video generation, particularly excelling in scenarios requiring precise control over video content.
Video Production
66.8K

Olmocr 7B 0225 Preview
olmOCR-7B-0225-preview is an advanced document recognition model developed by the Allen Institute for AI. It rapidly converts document images into editable plain text through efficient image processing and text generation. Fine-tuned from Qwen2-VL-7B-Instruct, it combines strong visual and language processing capabilities and is suited to large-scale document processing tasks. Its key advantages are high processing efficiency, accurate text recognition, and flexible prompting. The model is intended for research and educational use, is licensed under Apache 2.0, and emphasizes responsible use. A loading sketch follows this entry.
OCR
65.4K
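Since the model above is fine-tuned from Qwen2-VL-7B-Instruct, it can presumably be loaded with the standard Qwen2-VL classes in transformers. The sketch below rests on that assumption; the repository id, prompt wording, and image path are illustrative, and the official olmocr toolkit adds document-specific prompting and PDF handling on top of a call like this.

```python
# Sketch of loading olmOCR-7B-0225-preview with the standard Qwen2-VL classes.
# Repo id, prompt, and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "allenai/olmOCR-7B-0225-preview"   # assumed repository id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

page = Image.open("scanned_page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Transcribe this document page as plain text."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[page], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```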
Fresh Picks

Phi 4 Multimodal Instruct
Phi-4-multimodal-instruct is a multimodal foundation model developed by Microsoft that accepts text, image, and audio inputs and generates text outputs. Built on the research and datasets behind Phi-3.5 and Phi-4, the model has undergone supervised fine-tuning, direct preference optimization, and reinforcement learning from human feedback to improve instruction following and safety. It supports multilingual text, image, and audio input, offers a 128K context length, and applies to multimodal tasks such as speech recognition, speech translation, and visual question answering. The model shows significant improvements in multimodal capability, particularly in speech and vision tasks, giving developers a powerful base for building a wide range of multimodal applications.
AI Model
56.0K

Kimi Latest
kimi-latest is the newest AI model from Moonshot AI, updated in sync with the Kimi intelligent assistant. It offers strong context handling and automatic context caching, which can significantly reduce usage costs. The model supports image understanding as well as features such as ToolCalls and web search, making it suitable for building AI-powered assistants or customer-service systems. Priced at ¥1 per million tokens, it is positioned as an efficient and flexible AI model solution. A calling sketch follows this entry.
AI Model
125.9K
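The entry above describes an API-served chat model; the sketch below assumes Moonshot AI exposes it through an OpenAI-compatible endpoint. The base URL and environment-variable name are assumptions to verify against Moonshot's documentation.

```python
# Sketch of calling kimi-latest through an assumed OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],    # assumed env var name
    base_url="https://api.moonshot.cn/v1",     # assumed endpoint
)

response = client.chat.completions.create(
    model="kimi-latest",
    messages=[
        {"role": "system", "content": "You are a helpful customer-service assistant."},
        {"role": "user", "content": "Summarize my last three support tickets."},
    ],
    temperature=0.3,
)
print(response.choices[0].message.content)
```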

Magic 1 For 1
Magic 1-For-1 focuses on efficient video generation, with its core feature being the rapid conversion of text and images into video. The model optimizes memory usage and reduces inference latency by breaking the text-to-video generation task into two sub-tasks: text-to-image and image-to-video. Key advantages include efficiency, low latency, and scalability. Developed by the DA-Group team at Peking University, the model aims to advance the interactive foundational video generation field. The model and related code are open-source and available for free use, subject to compliance with the open-source license agreement.
Video Production
65.7K

Animagine XL 4.0
Animagine XL 4.0 is an anime-themed image generation model fine-tuned from Stable Diffusion XL 1.0. It was trained on 8.4 million diverse anime-style images for a total of 2,650 hours. The model focuses on generating and modifying anime-themed images from text prompts and supports various special tags to control different aspects of generation. Its main advantages are high-quality output, rich anime-style detail, and faithful reproduction of specific characters and styles. Developed by Cagliostro Research Lab, it is licensed under CreativeML Open RAIL++-M, which permits commercial use and modification.
Image Generation
75.1K

Confucius O1 14B
Confucius-o1-14B is a reasoning model developed by the NetEase Youdao team, built on Qwen2.5-14B-Instruct. It uses a two-stage learning strategy to automatically generate reasoning chains and summarize step-by-step problem-solving processes. The model targets education and is particularly suited to K-12 math problems, helping users quickly arrive at correct solution strategies and answers. Its lightweight design allows deployment on a single GPU without quantization, lowering the barrier to use. Its reasoning capabilities performed strongly in internal evaluations, providing solid technical support for AI applications in education.
Education
57.7K
English Picks

Codestral 25.01
Codestral 25.01 is an advanced coding model from Mistral AI, representing the state of the art in its programming-model line. The model is lightweight, fast, and proficient in more than 80 programming languages, and is optimized for low-latency, high-frequency use. It supports tasks such as fill-in-the-middle (FIM) code completion, code correction, and test generation. Thanks to improvements in architecture and tokenization, it generates and completes code roughly twice as fast as its predecessor, making it a leader in coding tasks and particularly strong in FIM use cases. Its main advantages are an efficient architecture, fast code generation, and fluency across many programming languages, which significantly improve developer productivity. Codestral 25.01 is available to developers worldwide through IDE/plugin partners such as Continue.dev, and it supports local deployment to meet enterprise data- and model-residency requirements. An FIM sketch follows this entry.
Coding Assistant
55.5K
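To make the FIM use case above concrete, here is a sketch using the mistralai Python SDK, where the model completes the gap between a prompt and a suffix. The model alias and the exact call shape are assumptions to check against Mistral's current documentation.

```python
# Fill-in-the-middle (FIM) sketch: the model writes the code between
# `prompt` and `suffix`. Model alias and response shape are assumptions.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

prompt = "def fibonacci(n: int) -> int:\n    \"\"\"Return the n-th Fibonacci number.\"\"\"\n"
suffix = "\n\nprint(fibonacci(10))\n"

response = client.fim.complete(
    model="codestral-latest",   # assumed alias for the latest Codestral release
    prompt=prompt,
    suffix=suffix,
    max_tokens=128,
)
print(response.choices[0].message.content)
```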
English Picks

Openai O1 API
OpenAI o1 is a high-performance AI model designed to tackle complex multi-step tasks with superior accuracy. It is the successor to o1-preview and has been used to build agent applications that simplify customer support, optimize supply-chain decisions, and forecast complex financial trends. The o1 model includes production-ready features such as function calling, structured outputs, developer messages, and vision capabilities. The o1-2024-12-17 version achieved new high scores on multiple benchmarks, improving cost efficiency and performance. A calling sketch follows this entry.
AI Model
67.6K
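The entry above mentions developer messages and function calling; the sketch below shows both through the Chat Completions API. The tool definition (`get_order_status`) is a hypothetical business function used purely for illustration.

```python
# Sketch: o1 with a developer message and a (hypothetical) function tool.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",            # hypothetical business function
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="o1-2024-12-17",
    messages=[
        {"role": "developer", "content": "You are a concise support agent."},
        {"role": "user", "content": "Where is order 42-A?"},
    ],
    tools=tools,
)

choice = response.choices[0]
if choice.message.tool_calls:
    call = choice.message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(choice.message.content)
```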

Fasthunyuan
FastHunyuan is an accelerated version of the HunyuanVideo model developed by Hao AI Lab, capable of generating high-quality videos in just 6 diffusion steps, which is approximately 8 times faster than the original HunyuanVideo model that required 50 steps. The model underwent consistency distillation training on the MixKit dataset, ensuring it is efficient and high-quality, suitable for scenarios requiring quick video production.
Video Production
65.1K

RWKV 6 Finch 7B World 3
RWKV-6 Finch 7B World 3 is an open-source artificial intelligence model featuring 7 billion parameters and trained on 3.1 trillion multilingual tokens. Renowned for its environmentally friendly design and high performance, it aims to provide high-quality open-source AI solutions for users worldwide, regardless of nationality, language, or economic status. The RWKV architecture is designed to minimize environmental impact, with fixed power consumption per token that is independent of context length.
AI Model
56.6K

Universal 2
Universal-2 is AssemblyAI's latest speech recognition model, surpassing the previous Universal-1 in both accuracy and precision. It captures the complexities of human language more effectively, producing transcripts that require little secondary review. That translates into sharper insights, faster workflows, and a better product experience. Universal-2 shows notable improvements in proper-noun recognition, text formatting, and alphanumeric recognition, reducing word error rates in practical applications. A transcription sketch follows this entry.
Speech Recognition
54.1K
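A minimal transcription sketch with the assemblyai Python SDK follows. Whether Universal-2 must be selected explicitly or is already the default model should be confirmed in AssemblyAI's documentation; the audio path is a placeholder.

```python
# Minimal transcription sketch. The audio file path is illustrative.
import os
import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("meeting_recording.mp3")  # local file or URL

if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)
print(transcript.text)
```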
Fresh Picks

Pixtral 12B
Pixtral 12B is a multimodal AI model developed by the Mistral AI team. It understands natural images and documents, delivering strong performance on multimodal tasks while maintaining state-of-the-art results on text benchmarks. The model supports variable image sizes and aspect ratios and can process an arbitrary number of images within a long context window. It is an upgraded version of Mistral Nemo 12B, designed for multimodal reasoning without sacrificing key text-processing abilities. A calling sketch follows this entry.
AI Model
49.4K
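The sketch below shows a multimodal chat call to the model above via Mistral's hosted API. The model identifier and the image-content schema are assumptions to verify against the current API reference, and the image URL is a placeholder.

```python
# Sketch of a text + image request to Pixtral 12B through Mistral's API.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-2409",   # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the chart in this image in two sentences."},
            {"type": "image_url", "image_url": "https://example.com/chart.png"},
        ],
    }],
)
print(response.choices[0].message.content)
```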

FLUX.1 Dev Controlnet Inpainting Alpha
FLUX.1-dev-Controlnet-Inpainting-Alpha is an AI image restoration model released by the AlimamaCreative Team, specifically developed to repair and fill in missing or damaged areas of images. This model performs optimally at a resolution of 768x768, delivering high-quality image restoration. As an alpha version, it showcases advanced technology in the field of image restoration and is expected to provide even better performance with further training and optimization.
AI Image Restoration
62.9K

Hyper FLUX 8Steps LoRA
Hyper FLUX 8Steps LoRA is an acceleration model developed by ByteDance, based on LoRA distillation. It reduces the number of diffusion sampling steps needed for FLUX image generation to as few as eight while maintaining or enhancing output quality, providing researchers and developers with an efficient and easy-to-use solution. A usage sketch follows this entry.
AI Model
56.0K
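The sketch below shows 8-step sampling with the base FLUX.1-dev pipeline in diffusers plus a step-distillation LoRA. The LoRA repository id, weight filename, and fusing scale are assumptions to verify against the model card.

```python
# 8-step FLUX sampling with a step-distillation LoRA (ids/scale assumed).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Assumed repo/filename for the 8-step LoRA weights.
pipe.load_lora_weights("ByteDance/Hyper-SD",
                       weight_name="Hyper-FLUX.1-dev-8steps-lora.safetensors")
pipe.fuse_lora(lora_scale=0.125)   # illustrative scale; check the model card

image = pipe(
    prompt="A cozy cabin in a snowy forest, golden hour",
    num_inference_steps=8,          # the point of the distillation LoRA
    guidance_scale=3.5,
).images[0]
image.save("hyper_flux_8steps.png")
```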

Flux Ip Adapter
The flux-ip-adapter is an image generation adapter developed by Black Forest Labs, based on the FLUX.1-dev model. It is trained to support image generation at resolutions of 512x512 and 1024x1024, with new checkpoints released regularly. It is primarily designed for ComfyUI, a node-based interface for diffusion workflows, where it can be integrated through custom nodes. The adapter is currently in beta, and users may need several attempts to achieve optimal results.
AI image generation
75.9K
Fresh Picks

Flux1.dev AsianFemale
Flux1.dev-AsianFemale is an experimental Low-Rank Adaptation (LoRA) model based on the Flux.1 D model, designed to explore training methods that shift the Flux model's default female imagery toward Asian features. The model has not undergone facial beautification or celebrity-face training, and as an experimental release it may exhibit training issues and limitations.
AI image generation
104.9K

Diffree
Diffree is a text-guided object-addition model capable of adding new objects to images based on text descriptions while maintaining background consistency, spatial appropriateness, and object quality and relevance. The model was trained on the OABench dataset and combines a stable diffusion model with an additional mask-prediction module, enabling it to predict where new objects should be placed for text-guided object addition.
AI Image Editing
97.2K
English Picks

Swapper
Swapper is an AI-driven fashion model and e-commerce assistant designed to help businesses cut costs with high-quality AI generation technology. It offers professional AI fashion models to meet various modeling needs, significantly reducing model fees and supporting profit growth. Swapper can also freely switch shooting scenes across different settings, shortening the shooting cycle and saving budget. Its main functions include commercial product shots, clothing color changes, and more, fulfilling design needs efficiently and accurately while reducing the cost of repeated shoots.
AI design tools
93.3K

Paints UNDO
Paints-UNDO is a project that aims to provide a foundational model of human painting behavior, in the hope that future AI models can better serve the genuine needs of human artists. The project name 'Paints-Undo' comes from the model's output, which looks like repeatedly pressing the undo button (commonly Ctrl+Z) in digital painting software.
AI image generation
87.5K

X Model
X Model is a platform that integrates popular mainstream AI models, allowing users to easily access these models in their products. Its main advantages include a variety of model choices, high-quality output results, and a simple and easy-to-use integration process. X Model offers flexible pricing, suitable for businesses of all sizes.
Development Platform
51.9K
Fresh Picks

Instantstyle Plus
InstantStyle-Plus is an advanced image generation model that focuses on achieving style transfer during text-to-image generation while maintaining the integrity of the original content. It decomposes the style transfer task into three sub-tasks: style injection, spatial structure preservation, and semantic content preservation. Using the InstantStyle framework, it achieves style injection in an efficient and lightweight manner. The model maintains spatial composition through the inversion of content latent noise and the use of Tile ControlNet. It also enhances semantic content fidelity through a global semantic adapter. Additionally, a style extractor is used as a discriminator, providing supplementary style guidance. The main advantage of InstantStyle-Plus lies in its ability to achieve a harmonious union of style and content without sacrificing content integrity.
AI image generation
71.8K

Claude 3.5 Sonnet
Claude 3.5 Sonnet, developed by Anthropic, strikes a remarkable balance between intelligence, speed, and cost. The model sets new industry benchmarks in graduate-level reasoning, undergraduate-level knowledge, and programming proficiency. It excels at understanding nuance, humor, and complex instructions, and can generate high-quality content in a natural, friendly tone. It also demonstrates strong capabilities in visual reasoning, chart interpretation, and image-to-text transcription, making it an ideal choice for industries such as retail, logistics, and financial services. A calling sketch follows this entry.
AI Model
126.4K
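The sketch below shows a minimal call to the model above with the anthropic Python SDK. The dated model alias was current at the time of writing and may need updating.

```python
# Minimal Claude 3.5 Sonnet request via the anthropic SDK.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",   # check for the latest dated alias
    max_tokens=512,
    messages=[
        {"role": "user",
         "content": "Explain the difference between latency and throughput in two sentences."},
    ],
)
print(message.content[0].text)
```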

Utopia
Utopia is a personalized character-creation platform dedicated to building the next generation of highly anthropomorphic AI agents. Its main advantages are greater control, human-likeness, and security. The product emphasizes user participation in creation and focuses on delivering highly personalized character models.
AI Character Generation
66.8K
English Picks

Samba 1 Turbo
Samba-1 Turbo is a platform that offers AI model selection and application, allowing developers to try, compare, and evaluate various expert models through its free developer inference service. Additionally, the platform provides several demo business applications built upon Samba-1, as well as the open-source language expert SambaLingo. Samba-1 Turbo aims to empower developers with powerful tools to simplify the integration and application of AI models.
Development Platform
56.3K
English Picks

Openai & Other LLM API Pricing Calculator
A cost calculator for OpenAI and other large language model (LLM) APIs that helps businesses and developers estimate and compare the cost of different AI models in their projects. The tool provides price calculations for multiple providers, including OpenAI, Azure, Anthropic, Llama 3, Google Gemini, Mistral, and Cohere, computing costs from input tokens, output tokens, and API call volume. The arithmetic behind such a calculator is sketched after this entry.
AI tools website directory
58.2K
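The cost arithmetic such a calculator performs is straightforward: spend scales with input tokens, output tokens, and call volume. The sketch below illustrates it with placeholder prices, not current quotes; always check the provider's pricing page.

```python
# Illustrative LLM API cost arithmetic. Prices are placeholders, not quotes.

def monthly_cost(input_tokens: int, output_tokens: int, calls_per_month: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimated monthly spend in USD at given per-million-token prices."""
    per_call = (input_tokens / 1_000_000) * price_in_per_m \
             + (output_tokens / 1_000_000) * price_out_per_m
    return per_call * calls_per_month

# Example: 1,500 input and 400 output tokens per call, 10,000 calls/month,
# at hypothetical prices of $2.50 / $10.00 per million input/output tokens.
print(f"${monthly_cost(1_500, 400, 10_000, 2.50, 10.00):,.2f} per month")  # $77.50
```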
# Featured AI Tools
English Picks

Jules AI
Jules is an asynchronous coding agent that automatically handles tedious coding tasks so you can spend your time on core coding work. Its main strength is its GitHub integration: it automates pull requests (PRs), runs tests, and verifies code on cloud virtual machines, significantly improving development efficiency. Jules suits a wide range of developers and is especially helpful for busy teams managing project and code quality.
Development & Programming
48.0K

Nocode
NoCode is a platform that requires no programming experience, allowing users to express ideas in natural language and quickly generate applications. This lowers the barrier to development and enables more people to bring their ideas to life. The platform offers real-time preview and one-click deployment, making it easy to use even for users without technical knowledge.
Development Platform
44.2K

Listenhub
ListenHub is a lightweight AI podcast generator that supports both Chinese and English. Using state-of-the-art AI technology, it quickly generates podcast content matched to users' interests. Its main advantages are natural dialogue and ultra-high-quality audio, delivering a premium listening experience anytime, anywhere. ListenHub not only improves content generation speed but also works on mobile devices, making it easy to use in a variety of settings. It is positioned as an efficient tool for information access, serving the needs of a broad range of listeners.
AI
42.8K
Chinese Picks

Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. It adopts an ultra-high-compression encoder-decoder and a new diffusion architecture, reaching millisecond-level image generation and avoiding the long waits of conventional generation. By combining reinforcement learning algorithms with human aesthetic knowledge, it improves the realism and detail of generated images, making it well suited to professional users such as designers and creators.
Image Generation
43.3K

Openmemory MCP
OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). Users retain full control over their data and can keep it secure while building AI applications. The project supports Docker, Python, and Node.js, making it well suited to developers building personalized AI experiences, and it is recommended for users who want to use AI without exposing personal information.
Open Source
45.0K

Fastvlm
FastVLM is an efficient vision encoding model designed for vision-language models. Using the innovative FastViTHD hybrid vision encoding engine, it reduces the encoding time for high-resolution images and the number of output tokens, improving model throughput and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities, and it performs especially well on mobile devices that require fast responses.
Image Processing
42.5K
English Picks

Pika
Pika is a video production platform where users upload their creative ideas and AI automatically generates videos based on them. Its main features are video generation from diverse ideas, professional video effects, and simple, easy-to-use operation. It uses a free-trial model and targets creators and video enthusiasts.
Video Production
17.6M
Chinese Picks

Liblibai
LiblibAI is a leading Chinese AI creation platform that provides powerful AI creation capabilities to support creators. The platform offers a vast number of free AI creation models that users can search and use to create images, text, audio, and more, and it also supports training custom AI models. Aimed at a broad base of creators, it strives to provide equal creative opportunities and serve the creative industry so that everyone can enjoy the pleasure of creating.
AI Model
6.9M